Planning for Decentralized Control of Multiple Robots Under Uncertainty
We describe a probabilistic framework for synthesizing control policies for
general multi-robot systems, given environment and sensor models and a cost
function. Decentralized, partially observable Markov decision processes
(Dec-POMDPs) are a general model of decision processes where a team of agents
must cooperate to optimize some objective (specified by a shared reward or cost
function) in the presence of uncertainty, but where communication limitations
mean that the agents cannot share their state, so execution must proceed in a
decentralized fashion. While Dec-POMDPs are typically intractable to solve for
real-world problems, recent research on the use of macro-actions in Dec-POMDPs
has significantly increased the size of problem that can be practically solved
as a Dec-POMDP. We describe this general model, and show how, in contrast to
most existing methods that are specialized to a particular problem class, it
can synthesize control policies that use whatever opportunities for
coordination are present in the problem, while balancing uncertainty in
outcomes, sensor information, and information about other agents. We use three
variations on a warehouse task to show that a single planner of this type can
generate cooperative behavior using task allocation, direct communication, and
signaling, as appropriate.
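The defining constraint above is decentralized execution: each agent must act on its own local observation history, never on the shared global state. A minimal toy sketch of that idea (our own illustration, with hand-written policies standing in for the planner's output; `Agent`, `pick`, and the observations are hypothetical names):

```python
from dataclasses import dataclass

# Toy sketch of decentralized execution: each agent chooses actions from
# its own private observation history only, never from global state.

@dataclass
class Agent:
    policy: dict         # maps local observation histories to actions
    history: tuple = ()  # this agent's private observation history

    def act(self):
        # Decentralized: the action depends only on what *this* agent saw.
        return self.policy.get(self.history, "wait")

    def observe(self, obs):
        self.history = self.history + (obs,)

# Two agents cooperate toward a shared objective without communicating.
a1 = Agent(policy={(): "go-left", ("clear",): "pick"})
a2 = Agent(policy={(): "go-right", ("clear",): "pick"})

joint_action = (a1.act(), a2.act())
print(joint_action)  # ('go-left', 'go-right')
```

In a real Dec-POMDP solver the policies are synthesized jointly offline to optimize the shared reward, precisely because the agents cannot coordinate online.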
Deep Radial-Basis Value Functions for Continuous Control
A core operation in reinforcement learning (RL) is finding an action that is
optimal with respect to a learned value function. This operation is often
challenging when the learned value function takes continuous actions as input.
We introduce deep radial-basis value functions (RBVFs): value functions learned
using a deep network with a radial-basis function (RBF) output layer. We show
that the maximum action-value with respect to a deep RBVF can be approximated
easily and accurately. Moreover, deep RBVFs can represent any true value
function owing to their support for universal function approximation. We extend
the standard DQN algorithm to continuous control by endowing the agent with a
deep RBVF. We show that the resultant agent, called RBF-DQN, significantly
outperforms value-function-only baselines, and is competitive with
state-of-the-art actor-critic algorithms.
Comment: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI).
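The key property claimed above — that the maximum action-value of an RBF output layer is easy to approximate — can be sketched in a few lines of numpy. This is our own simplified reading, not the paper's architecture: the network outputs N action centroids and scalar values for a state, Q(s, a) is a softmax-weighted average of those values with RBF weights around each centroid, and the maximum over actions is approximated by evaluating Q only at the centroids (the smoothing parameter `beta` and the toy centroids are illustrative):

```python
import numpy as np

def rbf_q(a, centroids, values, beta=2.0):
    # RBF layer: weight each value by a softmax over negative squared
    # distances from the query action to each centroid.
    logits = -beta * np.sum((centroids - a) ** 2, axis=1)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return float(w @ values)

def approx_max_q(centroids, values, beta=2.0):
    # The claimed property: max_a Q(s, a) is well approximated by
    # evaluating Q only at the centroids themselves.
    return max(rbf_q(c, centroids, values, beta) for c in centroids)

centroids = np.array([[0.0], [1.0], [2.0]])  # candidate 1-D actions
values = np.array([0.5, 1.5, 1.0])           # their predicted values
print(approx_max_q(centroids, values))
```

Because the softmax weights concentrate near the nearest centroid, the maximizing action lies close to the centroid with the highest value, which is what makes the argmax cheap even in continuous action spaces.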
Perceptual Context in Cognitive Hierarchies
Cognition does not only depend on bottom-up sensor feature abstraction, but
also relies on contextual information being passed top-down. Context is higher
level information that helps to predict belief states at lower levels. The main
contribution of this paper is to provide a formalisation of perceptual context
and its integration into a new process model for cognitive hierarchies. Several
simple instantiations of a cognitive hierarchy are used to illustrate the role
of context. Notably, we demonstrate the use of context in a novel approach to
visually track the pose of rigid objects with just a 2D camera.
Correction: Konidaris et al. Dating of the Lower Pleistocene Vertebrate Site of Tsiotra Vryssi (Mygdonia Basin, Greece): Biochronology, Magnetostratigraphy, and Cosmogenic Radionuclides. Quaternary 2021, 4, 1
Background and scope: The late Villafranchian large mammal age (~2.0–1.2 Ma) of the Early Pleistocene is a crucial interval of time for mammal/hominin migrations and faunal turnovers in western Eurasia. However, an accurate chronological framework for the Balkans and adjacent territories is still missing, preventing pan-European biogeographic correlations and schemes. In this article, we report the first detailed chronological scheme for the late Villafranchian of southeastern Europe through a comprehensive and multidisciplinary dating approach (biochronology, magnetostratigraphy, and cosmogenic radionuclides) of the recently discovered Lower Pleistocene vertebrate site Tsiotra Vryssi (TSR) in the Mygdonia Basin, Greece. Results: The minimum burial ages (1.88 ± 0.16 Ma, 2.10 ± 0.18 Ma, and 1.98 ± 0.18 Ma) provided by the method of cosmogenic radionuclides indicate that the normal magnetic polarity identified below the fossiliferous layer correlates to the Olduvai subchron (1.95–1.78 Ma; C2n). Therefore, an age younger than 1.78 Ma is indicated for the fossiliferous layer, which was deposited during reverse polarity chron C1r. These results are in agreement with the biochronological data, which further point to an upper age limit at ~1.5 Ma. Overall, an age between 1.78 and ~1.5 Ma (i.e., within the first part of the late Villafranchian) is proposed for the TSR fauna. Conclusions: Our results not only provide age constraints for the local mammal faunal succession, thus allowing for a better understanding of faunal changes within the same sedimentary basin, but also contribute to improving correlations on a broader scale, leading to more accurate biogeographic, palaeoecological, and taphonomic interpretations.
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Despite the advancement of machine learning techniques in recent years,
state-of-the-art systems lack robustness to "real world" events, where the
input distributions and tasks encountered by the deployed systems will not be
limited to the original training context, and systems will instead need to
adapt to novel distributions and tasks while deployed. This critical gap may be
addressed through the development of "Lifelong Learning" systems that are
capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3)
Scalability. Unfortunately, efforts to improve these capabilities are typically
treated as distinct areas of research that are assessed independently, without
regard to the impact of each separate capability on other aspects of the
system. We instead propose a holistic approach, using a suite of metrics and an
evaluation framework to assess Lifelong Learning in a principled way that is
agnostic to specific domains or system techniques. Through five case studies,
we show that this suite of metrics can inform the development of varied and
complex Lifelong Learning systems. We highlight how the proposed suite of
metrics quantifies performance trade-offs present during Lifelong Learning
system development - both the widely discussed Stability-Plasticity dilemma and
the newly proposed relationship between Sample Efficient and Robust Learning.
Further, we make recommendations for the formulation and use of metrics to
guide the continuing development of Lifelong Learning systems and assess their
progress in the future.
Comment: To appear in Neural Networks.
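To make the trade-offs above concrete, here is a hypothetical illustration of the *kind* of metric such a suite includes; the formulas below are a simplification of our own, not the paper's actual metric definitions:

```python
# Two toy lifelong-learning metrics (our own simplification):
# forward transfer captures the plasticity side, maintenance the
# stability side of the Stability-Plasticity dilemma.

def forward_transfer(perf_with_pretraining, perf_from_scratch):
    # Ratio > 1.0 means earlier learning helped on the new task.
    return perf_with_pretraining / perf_from_scratch

def maintenance(perf_after_new_tasks, perf_when_first_learned):
    # Ratio < 1.0 indicates forgetting of the earlier task.
    return perf_after_new_tasks / perf_when_first_learned

print(forward_transfer(0.9, 0.6))  # ≈ 1.5: pretraining helped
print(maintenance(0.45, 0.9))      # ≈ 0.5: half the performance retained
```

A system tuned only for one of these ratios typically sacrifices the other, which is exactly why the paper argues the capabilities must be assessed together rather than as separate research threads.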
Autonomous Robot Skill Acquisition
Among the most impressive aspects of human intelligence is skill acquisition—the ability to identify important behavioral components, retain them as skills, refine them through practice, and apply them in new task contexts. Skill acquisition underlies both our ability to choose to spend time and effort to specialize at particular tasks, and our ability to collect and exploit previous experience to become able to solve harder and harder problems over time with less and less cognitive effort.
Hierarchical reinforcement learning provides a theoretical basis for skill acquisition, including principled methods for learning new skills and deploying them during problem solving. However, existing work focuses largely on small, discrete problems. This dissertation addresses the question of how we scale such methods up to high-dimensional, continuous domains, in order to design robots that are able to acquire skills autonomously. This presents three major challenges; we introduce novel methods addressing each of these challenges.
First, how does an agent operating in a continuous environment discover skills? Although the literature contains several methods for skill discovery in discrete environments, it offers none for the general continuous case. We introduce skill chaining, a general skill discovery method for continuous domains. Skill chaining incrementally builds a skill tree that allows an agent to reach a solution state from any of its start states by executing a sequence (or chain) of acquired skills. We empirically demonstrate that skill chaining can improve performance over monolithic policy learning in the Pinball domain, a challenging dynamic and continuous reinforcement learning problem.
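The backward-chaining loop described above can be sketched compactly. This is a hedged illustration of the control flow only; the helper names (`learn_skill`, `covers`) and the toy 1-D domain are our own, not the dissertation's implementation:

```python
# Sketch of skill chaining: starting from the goal, repeatedly learn a
# skill whose target is the previous skill's initiation set, until the
# start states are covered.

def skill_chaining(start_states, goal_region, learn_skill, covers):
    chain, target = [], goal_region
    while not covers(target, start_states):
        policy, initiation_set = learn_skill(target)
        chain.append(policy)
        target = initiation_set      # the next skill must reach here
    return list(reversed(chain))     # execute in start -> goal order

# Toy 1-D domain: each "skill" can reliably reach its target interval
# from at most 3 units to the left, so its initiation set is (lo - 3, lo).
def learn_skill(target):
    lo, hi = target
    return f"reach[{lo},{hi})", (lo - 3, lo)

def covers(region, states):
    lo, hi = region
    return all(lo <= s < hi for s in states)

chain = skill_chaining([0.5], (9, 10), learn_skill, covers)
print(chain)  # ['reach[3,6)', 'reach[6,9)', 'reach[9,10)']
```

Executing the returned chain in order carries the agent from its start state to the goal, one acquired skill at a time.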
Second, how do we scale up to high-dimensional state spaces? While learning in relatively small domains is generally feasible, it becomes exponentially harder as the number of state variables grows. We introduce abstraction selection, an efficient algorithm for selecting skill-specific, compact representations from a library of available representations when creating a new skill. Abstraction selection can be combined with skill chaining to solve hard tasks by breaking them up into chains of skills, each defined using an appropriate abstraction. We show that abstraction selection selects an appropriate representation for a new skill using very little sample data, and that this leads to significant performance improvements in the Continuous Playroom, a relatively high-dimensional reinforcement learning problem.
Finally, how do we obtain good initial policies? The amount of experience required to learn a reasonable policy from scratch in most interesting domains is unrealistic for robots operating in the real world. We introduce CST, an algorithm for rapidly constructing skill trees (with appropriate abstractions) from sample trajectories obtained via human demonstration, a feedback controller, or a planner. We use CST to construct skill trees from human demonstration in the Pinball domain, and to extract a sequence of low-dimensional skills from demonstration trajectories on a mobile robot. The resulting skills can be reliably reproduced using a small number of example trajectories.
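The segmentation step behind constructing a skill tree from a demonstration can be caricatured as follows. This is a deliberately simplified sketch of the idea, not CST itself: CST uses changepoint detection over value-function approximators, whereas the "models" below are toy error functions, and all names are illustrative:

```python
# Toy trajectory segmentation: walk along a demonstration and cut a
# boundary wherever the model that best explains the current point
# changes (a crude stand-in for changepoint detection).

def segment(trajectory, models):
    def best_model(point):
        return min(models, key=lambda m: models[m](point))

    segments, current = [], [trajectory[0]]
    label = best_model(trajectory[0])
    for point in trajectory[1:]:
        m = best_model(point)
        if m == label:
            current.append(point)
        else:
            segments.append((label, current))
            label, current = m, [point]
    segments.append((label, current))
    return segments

# Two toy models, each scoring how poorly it explains a 1-D state.
models = {"near": lambda x: abs(x - 0), "far": lambda x: abs(x - 10)}
traj = [0, 1, 2, 8, 9, 10]
print(segment(traj, models))  # [('near', [0, 1, 2]), ('far', [8, 9, 10])]
```

Each resulting segment becomes a candidate skill with its own (here implicit) abstraction, which is what lets a single demonstration seed a whole chain of low-dimensional policies.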
These techniques are then applied to build a mobile robot control system for the uBot-5, resulting in a mobile robot that is able to acquire skills autonomously. We demonstrate that this system is able to use skills acquired in one problem to more quickly solve a new problem.
Axial Line Placement in Deformed Urban Grids
The problem of placing axial lines in configurations of convex, non-overlapping polygons originates in the technique of space syntax analysis, which is used in town planning to describe and analyse architectural structures. Unfortunately, the general problem has been found to be NP-Complete, because of the possibility of configurations in which local choices have to be made, which affect the global optimality of the solution. Because of this, previous research has focused either on finding special cases where an exact solution can be obtained in polynomial time, or heuristic algorithms where approximate solutions can be found in polynomial time.
Intrinsically Motivated Reinforcement Learning: A Promising Framework For Developmental Robot Learning
One of the primary challenges of developmental robotics is the question of how to learn and represent increasingly complex behavior in a self-motivated, open-ended way. Barto, Singh, and Chentanez (Barto, Singh, & Chentanez 2004; Singh, Barto, & Chentanez 2004) have recently presented an algorithm for intrinsically motivated reinforcement learning that strives to achieve broad competence in an environment in a task-nonspecific manner by incorporating internal reward to build a hierarchical collection of skills. This paper suggests that with its emphasis on task-general, self-motivated, and hierarchical learning, intrinsically motivated reinforcement learning is an obvious choice for organizing behavior in developmental robotics. We present additional preliminary results from a gridworld abstraction of a robot environment and advocate a layered learning architecture for applying the algorithm on a physically embodied system.
Visual Transfer For Reinforcement Learning Via Wasserstein Domain Confusion
We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and both the easy and hard settings of 16 OpenAI Procgen environments.
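The Wasserstein-1 estimate at the heart of such an objective is usually computed through its Kantorovich–Rubinstein dual: a Lipschitz critic is trained to maximize the mean feature difference between domains, and the feature extractor is then updated to minimize that same quantity (the "confusion" step). The numpy sketch below shows only the critic objective on 1-D features with a fixed 1-Lipschitz critic; it is our own illustration, not WAPPO's implementation:

```python
import numpy as np

def critic_objective(f, source_feats, target_feats):
    # Kantorovich-Rubinstein dual of Wasserstein-1:
    #   W1 ~= sup_{||f||_L <= 1}  E_source[f(x)] - E_target[f(x)]
    return f(source_feats).mean() - f(target_feats).mean()

# Toy 1-Lipschitz critic on 1-D features: the identity function.
f = lambda x: x

aligned = critic_objective(f, np.array([0.0, 1.0]), np.array([0.0, 1.0]))
misaligned = critic_objective(f, np.array([3.0, 4.0]), np.array([0.0, 1.0]))
print(aligned, misaligned)  # 0.0 when aligned, 3.0 when shifted by 3
```

In the full adversarial setup the critic would be a network (kept approximately Lipschitz, e.g. via a gradient penalty), and driving this objective toward zero is what forces source and target features to become indistinguishable.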